Elements of a plot
Additional components
A variable in the data is mapped directly to an element in the plot
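library(tidyverse)  # ggplot2, dplyr (glimpse, filter, %>%), readr (read_csv)
library(gridExtra)  # grid.arrange
glimpse(autism)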
# Observations: 604
# Variables: 7
# $ childid <int> 1, 1, 1, 1, 1, 10, 10, 10, 10, 100, 100, 100, 100, 10...
# $ sicdegp <fctr> high, high, high, high, high, low, low, low, low, hi...
# $ age2 <dbl> 0, 1, 3, 7, 11, 0, 1, 7, 11, 0, 1, 3, 7, 0, 1, 7, 11,...
# $ vsae <int> 6, 7, 18, 25, 27, 9, 11, 18, 39, 15, 24, 37, 135, 8, ...
# $ gender <fctr> male, male, male, male, male, male, male, male, male...
# $ race <fctr> white, white, white, white, white, white, white, whi...
# $ bestest2 <fctr> pdd, pdd, pdd, pdd, pdd, autism, autism, autism, aut...
ggplot(autism, aes(x=age2, y=vsae)) +
geom_point()
How is the data mapped to graphical elements?
ggplot(autism, aes(x=age2, y=vsae)) +
geom_jitter()
How is the data mapped to graphical elements?
ggplot(autism, aes(x=age2, y=vsae)) +
geom_point() + geom_line()
ggplot(autism, aes(x=age2, y=vsae, group=childid)) +
geom_point() + geom_line()
ggplot(autism, aes(x=age2, y=vsae, group=childid)) +
geom_point() + geom_line(alpha=0.5)
ggplot(autism, aes(x=age2, y=vsae, group=childid)) +
geom_line(alpha=0.2) + theme_bw()
Now we can see that some individuals degrade, while most improve with time.
ggplot(autism, aes(x=age2, y=vsae, group=childid)) +
geom_line(alpha=0.2) + scale_y_log10() + theme_bw()
ggplot(autism, aes(x=age2, y=vsae, group=childid, colour=bestest2)) +
geom_line(alpha=0.3) + scale_y_log10() + theme_bw()
Now we can see a lot of overlap between the two groups.
ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) +
geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) +
geom_smooth(se=FALSE) + scale_y_log10() + theme_bw()
ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) +
geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) +
geom_smooth(se=FALSE, method="lm") + scale_y_log10() + theme_bw()
What do we learn about autism, age, and the diagnosis at age 2?
The vsae scores do not cleanly separate the pdd and autism categories, but on average children diagnosed with autism at age 2 have lower scores. There is a lot of overlap between the two groups.
How is the data mapped to graphical elements?
ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) +
geom_boxplot() + scale_y_log10()
That’s not what I wanted: age2 is numeric, so the boxplots are not split by age. Converting it to a factor fixes this.
ggplot(autism, aes(x=factor(age2), y=vsae, colour=bestest2)) +
geom_boxplot() + scale_y_log10()
p1 <- ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) +
geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) +
geom_smooth(se=FALSE) + scale_y_log10() + theme(legend.position="none")
p2 <- ggplot(autism, aes(x=factor(age2), y=vsae, colour=bestest2)) +
geom_boxplot() + scale_y_log10() + theme(legend.position="none")
grid.arrange(p1, p2, ncol=2)
41% Of Fliers Think You’re Rude If You Recline Your Seat
fly <- read_csv("./data/flying-etiquette.csv")
glimpse(fly)
# Observations: 1,040
# Variables: 27
# $ RespondentID <dbl> ...
# $ How often do you travel by plane? <chr> ...
# $ Do you ever recline your seat when you fly? <chr> ...
# $ How tall are you? <int> ...
# $ Do you have any children under 18? <chr> ...
# $ In a row of three seats, who should get to use the two arm rests? <chr> ...
# $ In a row of two seats, who should get to use the middle arm rest? <chr> ...
# $ Who should have control over the window shade? <chr> ...
# $ Is itrude to move to an unsold seat on a plane? <chr> ...
# $ Generally speaking, is it rude to say more than a few words tothe stranger sitting next to you on a plane? <chr> ...
# $ On a 6 hour flight from NYC to LA, how many times is it acceptable to get up if you're not in an aisle seat? <chr> ...
# $ Under normal circumstances, does a person who reclines their seat during a flight have any obligation to the person sitting behind them? <chr> ...
# $ Is itrude to recline your seat on a plane? <chr> ...
# $ Given the opportunity, would you eliminate the possibility of reclining seats on planes entirely? <chr> ...
# $ Is it rude to ask someone to switch seats with you in order to be closer to friends? <chr> ...
# $ Is itrude to ask someone to switch seats with you in order to be closer to family? <chr> ...
# $ Is it rude to wake a passenger up if you are trying to go to the bathroom? <chr> ...
# $ Is itrude to wake a passenger up if you are trying to walk around? <chr> ...
# $ In general, is itrude to bring a baby on a plane? <chr> ...
# $ In general, is it rude to knowingly bring unruly children on a plane? <chr> ...
# $ Have you ever used personal electronics during take off or landing in violation of a flight attendant's direction? <chr> ...
# $ Have you ever smoked a cigarette in an airplane bathroom when it was against the rules? <chr> ...
# $ Gender <chr> ...
# $ Age <chr> ...
# $ Household Income <chr> ...
# $ Education <chr> ...
# $ Location (Census Region) <chr> ...
A mix of categorical and quantitative variables. What mappings are appropriate? Bars (area) for counts of categories; side-by-side boxplots for a categorical-quantitative pair.
ggplot(fly, aes(x=`How often do you travel by plane?`)) +
geom_bar() + coord_flip()
Categories are not sorted
fly$`How often do you travel by plane?` <-
factor(fly$`How often do you travel by plane?`, levels=c(
"Never","Once a year or less","Once a month or less",
"A few times per month","A few times per week","Every day"))
ggplot(fly, aes(x=`How often do you travel by plane?`)) +
geom_bar() + coord_flip()
fly_sub <- fly %>% filter(`How often do you travel by plane?` %in%
c("Once a year or less",
"Once a month or less")) %>%
filter(!is.na(`Do you ever recline your seat when you fly?`)) %>%
filter(!is.na(Age)) %>% filter(!is.na(Gender))
fly_sub$`Do you ever recline your seat when you fly?` %>% unique()
# [1] "About half the time" "Usually" "Always"
# [4] "Once in a while" "Never"
fly_sub$`Do you ever recline your seat when you fly?` <- factor(
fly_sub$`Do you ever recline your seat when you fly?`, levels=c(
"Never","Once in a while","About half the time",
"Usually","Always"))
ggplot(fly_sub, aes(y=`How tall are you?`,
x=`Do you ever recline your seat when you fly?`)) +
geom_boxplot() #+ coord_flip()
Take a look at the ggplot2 Cheat sheet
How many geoms are available in ggplot2? What is geom_rug?
p <- ggplot(autism, aes(x=age2, y=vsae))
p1 <- p + geom_point() + coord_flip()
p2 <- p + geom_point() + geom_rug() + coord_flip()
p3 <- p + geom_point() + geom_rug(position='jitter') + coord_flip()
grid.arrange(p1, p2, p3, nrow=3)
What is the difference between colour and fill?
Colour applies to 0- or 1-dimensional elements (points, lines, outlines), and fill applies to the interior of area (2-d) geoms.
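A minimal sketch of the distinction, using the built-in mtcars data as a stand-in (it is not one of the datasets in these notes). The same bar chart is drawn twice, once mapping cyl to colour and once to fill, and layer_data() is used to inspect the aesthetics that ggplot2 computed.

```r
library(ggplot2)

# mtcars ships with R; cyl is wrapped in factor() so it is treated as categorical.
p_colour <- ggplot(mtcars, aes(x = factor(cyl), colour = factor(cyl))) +
  geom_bar(fill = "white")   # colour sets the bar outline only
p_fill <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar()                 # fill sets the bar interior

# Computed aesthetics for the single layer of each plot.
d_colour <- layer_data(p_colour)
d_fill <- layer_data(p_fill)
```

Printing p_colour and p_fill side by side shows white bars with coloured outlines in the first plot and solid coloured bars in the second.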
What does coord_fixed() do? What is the difference between this and using theme(aspect.ratio=...)?
p <- ggplot(autism, aes(x=age2, y=vsae))
p1 <- p + geom_point() + coord_fixed(ratio = 1)
p2 <- p + geom_point() + theme(aspect.ratio = 1)
grid.arrange(p1, p2, ncol=2)
coord_fixed() operates on the raw data values, but theme(aspect.ratio=...) works on the plot dimensions.
What are scales? How many numeric transformation scales are there?
Scales do the transformation between data values and graphical element values. They are most often applied to position along x or y, for example to take a log or square root. ggplot2 provides three convenience numeric transformations for position: log10, sqrt and reverse.
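As a rough illustration of what a transformation scale does, the sketch below applies the log10, sqrt and reverse scales to y and then reads back the transformed positions with layer_data(). The mtcars data is used only as a convenient built-in example, not because it appears in these notes.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

p_log  <- p + scale_y_log10()    # y positions become log10(mpg)
p_sqrt <- p + scale_y_sqrt()     # y positions become sqrt(mpg)
p_rev  <- p + scale_y_reverse()  # y axis runs from high to low

# The layer data holds the values after the scale transformation.
y_log  <- layer_data(p_log)$y
y_sqrt <- layer_data(p_sqrt)$y
```

The axis labels still show the original units; only the positions are transformed.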
What are position adjustments? When would they be used?
Position adjustments shift elements slightly from their original coordinates. They are most often used with bar charts, to stack bars or to put them side by side.
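The three position adjustments most commonly used with bars can be compared in a small sketch. It uses the mpg data that ships with ggplot2 as a stand-in for a dataset with two categorical variables.

```r
library(ggplot2)

# class and drv are both categorical variables in mpg.
p <- ggplot(mpg, aes(x = class, fill = drv))

p_stack <- p + geom_bar()                    # default: position = "stack"
p_dodge <- p + geom_bar(position = "dodge")  # bars placed side by side
p_fill  <- p + geom_bar(position = "fill")   # stacked, rescaled to proportions

# Top of the tallest bar under each adjustment.
top_stack <- max(layer_data(p_stack)$ymax)
top_fill  <- max(layer_data(p_fill)$ymax)
```

Stacking makes totals easy to read, dodging makes the individual counts comparable, and filling shows proportions at the cost of hiding the counts.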
Use your cheat sheet to work out how to make a plot to explore the relationship between "Do you ever recline your seat when you fly?" and "Is it rude to recline your seat on a plane?"
unique(fly_sub$`Is itrude to recline your seat on a plane?`)
# [1] "Yes, somewhat rude" "No, not rude at all" "Yes, very rude"
unique(fly_sub$`Do you ever recline your seat when you fly?`)
# [1] About half the time Usually Always
# [4] Once in a while Never
# Levels: Never Once in a while About half the time Usually Always
ggplot(fly_sub, aes(x=`Do you ever recline your seat when you fly?`)) +
geom_bar() +
facet_wrap(~`Is itrude to recline your seat on a plane?`, ncol=3) +
coord_flip()
ggplot(fly_sub, aes(x=`Do you ever recline your seat when you fly?`,
fill=`Is itrude to recline your seat on a plane?`)) +
geom_bar()
ggplot(fly_sub, aes(x=`Do you ever recline your seat when you fly?`,
fill=`Is itrude to recline your seat on a plane?`)) +
geom_bar(position="dodge")
ggplot(fly_sub,
aes(x=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar() + coord_flip() + facet_wrap(~Gender)
fly_sub$Age <- factor(fly_sub$Age,
levels=c("18-29","30-44","45-60","> 60"))
ggplot(fly_sub,
aes(x=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar() + coord_flip() + facet_grid(Age~Gender)
p <- ggplot(fly_sub,
aes(x=`In general, is itrude to bring a baby on a plane?`,
fill=Gender)) +
geom_bar(position="fill") + coord_flip() +
facet_wrap(~Age, ncol=5)
p
p + scale_fill_brewer(palette="Dark2")
What it looks like to a colour-blind reader:
library(scales)
library(dichromat)
p1 <- p + theme(legend.position = "none")
clrs <- hue_pal()(3)
clrs <- dichromat(clrs)
p2 <- p + scale_fill_manual("", values=clrs) +
theme(legend.position = "none")
grid.arrange(p1, p2)
Can you find the odd one out?
df <- data.frame(x=runif(100), y=runif(100),
cl=sample(c(rep("A", 1), rep("B", 99))))
ggplot(data=df, aes(x, y, shape=cl)) + theme_bw() +
geom_point() + theme(legend.position="none", aspect.ratio=1)
Is it easier now?
ggplot(data=df, aes(x, y, colour=cl)) +
geom_point() + theme_bw() +
theme(legend.position="none", aspect.ratio=1)
library(RColorBrewer)
display.brewer.all()
ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`,
fill=Gender)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5)
With this arrangement we can see proportion of gender within each rudeness category, and compare these across age groups. How could we arrange this differently?
ggplot(fly_sub, aes(x=Gender,
fill=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) +
theme(legend.position="bottom")
What is different about the comparison now?
ggplot(fly_sub, aes(x=Age,
fill=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Gender, ncol=5) +
theme(legend.position="bottom")
The ggthemes package has many different styles for plots. Other packages such as xkcd, skittles, wesanderson, beyonce, … offer further styles.
library(xkcd)  # provides theme_xkcd()
ggplot(fly_sub, aes(x=Gender,
fill=`In general, is itrude to bring a baby on a plane?`)) +
geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) +
theme_xkcd() + theme(legend.position="bottom")
See the vignette for instructions on installing the xkcd font.